High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis

Abstract

Summary: In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used, where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. We introduce a surprisingly simple, interpretable and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
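
The normalization step mentioned in the summary, and the reason zero read counts are troublesome for a log-based model, can be illustrated with a minimal Python sketch. This is not the paper's estimator: the function name `to_compositions` and the toy count matrix are assumptions made up for illustration only.

```python
import numpy as np

def to_compositions(counts):
    """Divide each sample's read counts by its sequencing depth (row total).

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)  # total reads per sample
    return counts / depth

# Toy data (made up): 2 samples x 3 taxa, with different sequencing depths.
counts = np.array([[10, 30, 60],
                   [0, 50, 150]])

X = to_compositions(counts)  # each row now sums to 1

# The log of a zero proportion is -inf, so log-transformed covariates are
# undefined for any sample that contains a zero read count.
with np.errstate(divide="ignore"):
    log_X = np.log(X)

has_undefined_log = np.isinf(log_X).any(axis=1)
```

In this toy example the second sample contains a zero count, so its log-transformed covariates are not finite. Replacing such zeros by hand is the kind of subjective imputation that the summary says the proposed method avoids.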

Similar resources

Methods for regression analysis in high-dimensional data

As science, knowledge and technology evolve, new and precise methods for measuring, collecting and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

Analysis of High Dimensional Compositional Data Containing Structural Zeros with Applications to Microbiome Data

This paper is motivated by the recent interest in the analysis of high dimensional microbiome data. A key feature of this data is the presence of ‘structural zeros’ which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are insufficient to model these structural zeros. We define a general ...

Variable Selection in Log-linear Birnbaum-Saunders Regression Models for High-dimensional Survival Data

The Birnbaum-Saunders (BS) distribution is broadly used to model failure times in reliability and survival data. In this thesis, we propose a simultaneous parameter estimation and variable selection procedure in a log-linear BS regression model for high-dimensional survival data. We introduce a path-wise algorithm via the cyclical coordinate descent method based on the elastic-net penalty. To deal wi...

Binary Regression With a Misclassified Response Variable in Diabetes Data

Objectives: Categorical data analysis is very important in statistics and the medical sciences. When the binary response variable is misclassified, fitting the model yields biased estimates of the adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...

High Dimensional Variable Selection with Error Control

Background. Iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it is not only computationally intensive but may also cause a high false discovery rate (FDR). We propose to use the FDR as a screening method to reduce the high dimension to ...

Journal

Journal title: Biometrika

Year: 2021

ISSN: 0006-3444, 1464-3510

DOI: https://doi.org/10.1093/biomet/asab020